c option to generate object files from the source files:
ifort -c my_source1.f90 my_source2.f90 my_source3.f90
xiar rc my_lib.a my_source1.o my_source2.o my_source3.o
ifort main.f90 my_lib.a
ifort allows the string literals in an array constructor to have different lengths, but gfortran does not. For example, the following definition is invalid in gfortran while valid in ifort.
program test
character(10), dimension(5) :: models = (/"feddes.swp", "jarvis89.swp", "jarvis10.swp" , "pem.swp", "van.swp"/)
end
To make it valid in gfortran, give the array constructor an explicit type specification:
character(len=12), dimension(5) :: models = [character(len=12) :: "feddes.swp", &
"jarvis89.swp", "jarvis10.swp", "pem.swp", "van.swp"]
Note that the strings are fixed length. Text that is shorter is padded on the right with spaces, while text that is longer is truncated.
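The padding and truncation rule can be seen in a minimal sketch like the one below. The program and variable names are hypothetical, and len=8 is chosen only so that one literal is shorter and one is longer than the declared length.

```fortran
program string_padding
  implicit none
  ! Hypothetical variables for illustration; both have a fixed length of 8.
  character(len=8) :: short_name, long_name

  short_name = "pem.swp"        ! 7 characters: padded with one trailing blank
  long_name  = "jarvis89.swp"   ! 12 characters: truncated to "jarvis89"

  ! Brackets make the trailing blank visible.
  write(*, '(3a)') "[", short_name, "]"
  write(*, '(3a)') "[", long_name, "]"

  ! len reports the declared length, len_trim ignores trailing blanks.
  write(*, '(a, i0)') "len      = ", len(short_name)
  write(*, '(a, i0)') "len_trim = ", len_trim(short_name)
end program string_padding
```

Use trim() when comparing or concatenating such strings so the padding blanks do not get in the way.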
SIZE>0 when not allocated
program allocator
double precision, dimension(:), allocatable:: x
allocate(x(10))
write(*,*) size(x),allocated(x) ! >> 10 t
deallocate(x)
write(*,*) size(x),allocated(x) ! >> 10 f
end program
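The above program compiles and runs with ifort, but the value returned by size is only reliable if the array is currently allocated. After deallocate it may still report the old size, so check allocated first.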
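One way to avoid relying on size for an unallocated array is to guard the call with allocated. The sketch below assumes a hypothetical helper, checked_size, that returns 0 when the array is not allocated; it is not part of any library.

```fortran
program safe_size
  implicit none
  double precision, dimension(:), allocatable :: x

  allocate(x(10))
  write(*, *) checked_size(x)   ! array is allocated: reports 10
  deallocate(x)
  write(*, *) checked_size(x)   ! array is not allocated: reports 0

contains

  ! Hypothetical helper: only calls size when the array is allocated.
  integer function checked_size(a)
    double precision, dimension(:), allocatable, intent(in) :: a
    if (allocated(a)) then
      checked_size = size(a)
    else
      checked_size = 0
    end if
  end function checked_size

end program safe_size
```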
-fpe: Allows some control over floating-point exception handling for the main program at run-time.
-fpe0: abort execution if a floating-point invalid, divide-by-zero, or overflow exception occurs. Set it in debug mode.
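A small test program such as the following can be used to see the effect of the flag. This is a sketch under stated assumptions: the program name is hypothetical, and the volatile attribute is used only to keep the compiler from evaluating the division at compile time. Built with -fpe0, ifort is expected to abort at the division; built without it (the default is -fpe3), the program should run to completion and print an infinity.

```fortran
program fpe_demo
  implicit none
  real, volatile :: denom   ! volatile: keep the division from being folded at compile time
  real :: r

  denom = 0.0
  r = 1.0 / denom           ! raises a floating-point divide-by-zero exception
  write(*, *) "result = ", r
end program fpe_demo
```

For example, compare ifort -fpe0 -g -traceback fpe_demo.f90 against a plain ifort fpe_demo.f90 build.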
AVX may be better than AVX2
AVX2 doubles the width of integer vector instructions to 256 bits, and adds FMA.
Reference: Maybe in some cases AVX runs faster on an AVX2 platform
O2/O3 optimisation with Intel compiler 2017 Update 2 gives different results, and this also occurs for NFS. So I changed the OPT flags to -O2 -xHost -fp-model precise. The -fp-model precise flag is critical, and it does not affect NFS's performance.
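The source does not state why the results differ. One common cause (an assumption here, not a diagnosis of NFS) is that aggressive optimisation may reorder floating-point operations, and floating-point addition is not associative. The sketch below, with hypothetical names, evaluates the same sum in two orders. With IEEE double precision the first order gives 1.0 and the second gives 0.0, because 1 is lost when added to -1.0e16.

```fortran
program fp_reassociation
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! volatile keeps the compiler from folding the sums at compile time
  real(dp), volatile :: a, b, c
  real(dp) :: left, right

  a =  1.0e16_dp
  b = -1.0e16_dp
  c =  1.0_dp

  left  = (a + b) + c   ! a and b cancel exactly first, then c is added
  right = a + (b + c)   ! c is absorbed by rounding in b + c

  write(*, *) "(a + b) + c = ", left
  write(*, *) "a + (b + c) = ", right
end program fp_reassociation
```

A value-safe floating-point model such as -fp-model precise restricts this kind of reordering, which is consistent with the flag making results reproducible.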
Reference: Code Optimization: Special Compiler Options
-m, -xHost
-xHost tells the compiler to generate instructions for the highest instruction set available on the compilation host processor. The specialized code generated by this option may only run on a subset of Intel® processors. The -x options enable additional optimizations not enabled with options -m.
-m tells the compiler which features it may target, including which instruction sets it may generate. Code generated with these options should execute on any compatible, non-Intel processor with support for the corresponding instruction set.
Options -x and -m are mutually exclusive. If both are specified, the compiler uses the last one specified and generates a warning.
So if you want to run programs on AMD processors, use -mavx.
Check Recommended Intel Compiler Debugging Options. Or check this PDF.
The official installer URL of Intel Parallel Studio XE 2020 is
http://registrationcenter-download.intel.com/akdlm/IRC_NAS/tec/16744/parallel_studio_xe_2020_update2_cluster_edition.tgz
This URL is given on the Arch Linux repo webpage for the Intel Fortran Compiler. For other versions, check that page for updates.
gdb-ia of 2019 version
This was a known bug in some of the 2019 versions of our products. Please update your 2019 Intel products to Update 6 to get the fix. Or switch to the 2020 versions if available.
Intel MPI Benchmarks User Guide
For NFS, the most used MPI subroutines should be MPI_isend and MPI_irecv. These two subroutines are tested by IMB-MPI1 Exchange.
With the turbulence generation BC, MPI_bcast is also used heavily. It is tested by IMB-MPI1 Bcast.
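For orientation, the communication pattern described above can be sketched as a periodic nearest-neighbour exchange followed by a broadcast. This is an illustrative sketch only, not NFS code and not the benchmark source: the buffer names, the buffer size n, and the tags are hypothetical.

```fortran
program exchange_sketch
  use mpi
  implicit none
  integer, parameter :: n = 4            ! hypothetical message length
  integer :: ierr, rank, nprocs, left, right
  integer :: requests(4)
  double precision :: send_left(n), send_right(n)
  double precision :: recv_left(n), recv_right(n)
  double precision :: inflow(n)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Periodic chain: every rank has a left and a right neighbour.
  left  = modulo(rank - 1, nprocs)
  right = modulo(rank + 1, nprocs)

  send_left  = dble(rank)
  send_right = dble(rank)

  ! Post the receives first, then the sends, then wait for all four.
  ! Tag 0 marks data travelling to the right, tag 1 data travelling to the left.
  call MPI_Irecv(recv_left,  n, MPI_DOUBLE_PRECISION, left,  0, MPI_COMM_WORLD, requests(1), ierr)
  call MPI_Irecv(recv_right, n, MPI_DOUBLE_PRECISION, right, 1, MPI_COMM_WORLD, requests(2), ierr)
  call MPI_Isend(send_right, n, MPI_DOUBLE_PRECISION, right, 0, MPI_COMM_WORLD, requests(3), ierr)
  call MPI_Isend(send_left,  n, MPI_DOUBLE_PRECISION, left,  1, MPI_COMM_WORLD, requests(4), ierr)
  call MPI_Waitall(4, requests, MPI_STATUSES_IGNORE, ierr)

  ! Broadcast: rank 0 owns the data and every other rank receives a copy.
  if (rank == 0) inflow = 1.0d0
  call MPI_Bcast(inflow, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  write(*, *) "rank", rank, "from left:", recv_left(1), "from right:", recv_right(1)

  call MPI_Finalize(ierr)
end program exchange_sketch
```

With Intel MPI this would typically be built with mpiifort and launched with mpirun -n 4 ./a.out.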
Example.
Main program file
! Intel MKL Data Fitting interface definitions
include 'mkl_df.f90'
program main
  use MKL_DF_TYPE
  use MKL_DF
  implicit none
  integer, parameter :: wp = 8
  ! Spline settings: cubic spline with a not-a-knot boundary condition
  integer, parameter :: xhint = DF_NON_UNIFORM_PARTITION
  integer, parameter :: yhint = DF_NO_HINT
  integer, parameter :: sorder = DF_PP_CUBIC
  integer, parameter :: stype = DF_PP_NATURAL
  integer, parameter :: bc_type = DF_BC_NOT_A_KNOT
  integer, parameter :: scoeffhint = DF_NO_HINT
  ! Interpolation settings: return function values only (derivative order 0)
  integer, parameter :: sitehint = DF_NON_UNIFORM_PARTITION
  integer, parameter :: ndorder = 1
  integer, dimension(1), parameter :: dorder = [0]
  integer, parameter :: rhint = DF_MATRIX_STORAGE_ROWS
  TYPE (DF_TASK) :: task
  integer :: errcode, i, nx, nvar
  real(wp), dimension(:), allocatable :: x, y, xi, yi, scoeff
  continue
  ! Breakpoints x(i) = i**2 (non-uniform partition)
  nx = 7
  allocate(x(nx))
  do i = 1, nx
    x(i) = real(i**2)
  end do
  ! Function values y = x**3
  nvar = 1
  allocate(y(nvar*nx))
  do i = 1, nx
    y(i) = x(i)**3
  end do
  write(*, *) x
  write(*, *) y
  ! Interpolation sites xi and storage for the results yi
  allocate(xi(nx), source=0.0_wp)
  allocate(yi(nx), source=0.0_wp)
  do i = 1, nx
    xi(i) = (real(i)/2.0_wp)**2+1.0_wp
  end do
  ! sorder coefficients for each of the nx-1 sub-intervals
  allocate(scoeff((nx-1)*sorder))
  ! Create the task, set the spline parameters, construct the spline, interpolate
  errcode = dfdNewTask1D( task, nx, x, xhint, nvar, y, yhint )
  errcode = dfdEditPPSpline1D( task, sorder, stype, bc_type, scoeff=scoeff, scoeffhint=scoeffhint )
  errcode = dfdConstruct1D( task, DF_PP_SPLINE, DF_METHOD_STD )
  errcode = dfdInterpolate1D( task, DF_INTERP, DF_METHOD_PP, nx, xi, sitehint, ndorder, dorder, r=yi, rhint=rhint )
  ! Print the sites, the interpolated values, and the analytical values xi**3
  write(*, *) xi
  write(*, *) yi
  write(*, *) xi**3
end program
Makefile
ifort -mkl -o dfd_test dfd_test.f90
The results show that the interpolated values match the analytical values. The five blocks printed below are x, y, xi, yi, and xi**3, in that order.
1.00000000000000 4.00000000000000 9.00000000000000
16.0000000000000 25.0000000000000 36.0000000000000
49.0000000000000
1.00000000000000 64.0000000000000 729.000000000000
4096.00000000000 15625.0000000000 46656.0000000000
117649.000000000
1.25000000000000 2.00000000000000 3.25000000000000
5.00000000000000 7.25000000000000 10.0000000000000
13.2500000000000
1.95312500000001 8.00000000000001 34.3281250000000
125.000000000000 381.078125000000 1000.00000000000
2326.20312500000
1.95312500000000 8.00000000000000 34.3281250000000
125.000000000000 381.078125000000 1000.00000000000
2326.20312500000
Use this type of analysis to check the instructions retired.
It requires hardware event-based sampling collection. To enable it, you need to build and install the sampling driver. Check the official Intel webpage for reference.